perf(insert-nodes): native Rust/rusqlite pipeline for node insertion#654
perf(insert-nodes): native Rust/rusqlite pipeline for node insertion#654carlos-alm wants to merge 5 commits intomainfrom
Conversation
Four root causes fixed across both engines: WASM JS/TS: extractConstantsWalk only walked root-level children, missing constants inside top-level blocks (if, try, for). Now recurses into all non-function scopes, matching native's recursive walk. WASM JS/TS: query pattern only matched property_identifier for methods, missing private #methods. Added private_property_identifier pattern. WASM Rust: extractRustParameters included self_parameter nodes that native correctly skips. Now skips them. Native JS: find_parent_of_types missed generator_function_declaration and generator_function, leaking constants from generator function bodies. Added both types to the scope filter. Native Rust: function_item handler emitted a duplicate entry for trait default-impl functions already captured by trait_item handler. Added parent guard matching WASM's existing skip logic.
|
Claude finished @carlos-alm's task —— View job Reviewing PR: fix WASM-native engine parity gap Root Cause AnalysisProblem Being Solved: This PR addresses a 26-node divergence between WASM and native parsing engines by fixing 5 specific extraction issues across both engines. Why This Approach: The approach directly fixes root causes in the parsing logic rather than masking symptoms. Each fix targets a specific extraction inconsistency. Risk Assessment: ✅ LOW RISK - The changes are surgical fixes to known issues with solid test coverage. Backlog Compliance
Code Quality Review✅ Strong Points
Detailed Fix AnalysisWASM JS/TS Fixes (
Native JS Fix ( WASM Rust Fix ( Native Rust Fix ( ✅ Test Strategy
Final AssessmentThis is exemplary engineering work. The PR:
The approach follows the project's principle: "Never document bugs as expected behavior" - instead of accepting the divergence, it systematically fixes the underlying extraction issues. Rating: ⭐⭐⭐⭐⭐ (5/5) Action: APPROVE Reasoning: This PR directly improves the core foundation of codegraph (parsing accuracy) with zero risk, excellent documentation, and proper testing. The fixes are surgical, well-reasoned, and address real extraction bugs rather than masking symptoms. |
Greptile SummaryThis PR introduces a native Rust/rusqlite pipeline for the Confidence Score: 5/5Safe to merge — the native fast-path is fully gated behind a boolean return and falls back to the unchanged JS path on any failure. Both remaining findings are P2 cleanup items (a dead
Important Files Changed
Reviews (2): Last reviewed commit: "fix(parser): prevent double-counting exp..." | Re-trigger Greptile |
src/extractors/javascript.ts
Outdated
| let declNode = child; | ||
| // Handle `export const …` — unwrap the export_statement to its declaration child | ||
| if (node.type === 'export_statement') { | ||
| const inner = node.childForFieldName('declaration'); | ||
| if (!inner) continue; | ||
| declNode = inner; | ||
| if (child.type === 'export_statement') { | ||
| const inner = child.childForFieldName('declaration'); | ||
| if (inner) declNode = inner; | ||
| } | ||
|
|
||
| const t = declNode.type; | ||
| if (t !== 'lexical_declaration' && t !== 'variable_declaration') continue; | ||
| if (!declNode.text.startsWith('const ')) continue; | ||
|
|
||
| for (let j = 0; j < declNode.childCount; j++) { | ||
| const declarator = declNode.child(j); | ||
| if (!declarator || declarator.type !== 'variable_declarator') continue; | ||
| const nameN = declarator.childForFieldName('name'); | ||
| const valueN = declarator.childForFieldName('value'); | ||
| if (!nameN || nameN.type !== 'identifier' || !valueN) continue; | ||
| // Skip functions — already captured by query patterns | ||
| const valType = valueN.type; | ||
| if ( | ||
| valType === 'arrow_function' || | ||
| valType === 'function_expression' || | ||
| valType === 'function' | ||
| ) | ||
| continue; | ||
| if (isConstantValue(valueN)) { | ||
| definitions.push({ | ||
| name: nameN.text, | ||
| kind: 'constant', | ||
| line: declNode.startPosition.row + 1, | ||
| endLine: nodeEndLine(declNode), | ||
| }); | ||
| if (t === 'lexical_declaration' || t === 'variable_declaration') { | ||
| if (declNode.text.startsWith('const ')) { | ||
| for (let j = 0; j < declNode.childCount; j++) { | ||
| const declarator = declNode.child(j); | ||
| if (!declarator || declarator.type !== 'variable_declarator') continue; | ||
| const nameN = declarator.childForFieldName('name'); | ||
| const valueN = declarator.childForFieldName('value'); | ||
| if (!nameN || nameN.type !== 'identifier' || !valueN) continue; | ||
| // Skip functions — already captured by query patterns | ||
| const valType = valueN.type; | ||
| if ( | ||
| valType === 'arrow_function' || | ||
| valType === 'function_expression' || | ||
| valType === 'function' | ||
| ) | ||
| continue; | ||
| if (isConstantValue(valueN)) { | ||
| definitions.push({ | ||
| name: nameN.text, | ||
| kind: 'constant', | ||
| line: declNode.startPosition.row + 1, | ||
| endLine: nodeEndLine(declNode), | ||
| }); | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Recurse into non-function children (blocks, if-statements, etc.) | ||
| extractConstantsWalk(child, definitions); |
There was a problem hiding this comment.
Exported constants counted twice
When child.type === 'export_statement', the code:
- Unwraps
inner = child.childForFieldName('declaration')(thelexical_declaration) - Processes
declNode = inner→ pushes the constant - Then unconditionally calls
extractConstantsWalk(child, definitions)— recursing into theexport_statementitself
During that recursion the lexical_declaration is found again as a direct child of export_statement (.child(i) returns all children including named-field children), declNode is set to it, and the constant is pushed a second time.
This only affects top-level export const X = value declarations, but every such constant will appear as a duplicate in the definitions array, producing ghost nodes in the graph.
The minimal fix at the bottom of the loop:
// Recurse into non-function, non-export-statement children
if (child.type !== 'export_statement') {
extractConstantsWalk(child, definitions);
}There was a problem hiding this comment.
Fixed — added a guard to skip recursion into export_statement nodes in extractConstantsWalk. After unwrapping and processing the inner lexical_declaration, the function now skips recursing into the export_statement itself, preventing the duplicate. See commit 9215de9.
Move the entire insert-nodes stage to Rust when the native engine is available. The new `bulk_insert_nodes` napi function receives marshalled definitions, exports, children, and file-hash data from JS and writes everything to SQLite via rusqlite in a single transaction — eliminating the JS↔better-sqlite3 FFI overhead that dominated the ~206ms timing. Rust path: prepare_cached + individual executes (zero per-row FFI cost). JS fallback preserved unchanged for WASM engine or native failure. Target: insertMs < 50 ms on native full builds.
Summary
bulk_insert_nodesnapi function incrates/codegraph-core/src/insert_nodes.rsthat writes nodes, children, containment edges, exports, and file hashes directly to SQLite via rusqlite in a single transactionallSymbolsto lean batch format and delegate all DB writes to Rust — eliminating JS↔better-sqlite3 FFI overheadTarget:
insertMs< 50ms on native full builds (current: ~206ms)Test plan